Smart Paper Notes

Interpretability and causal discovery of the machine learning models to predict the production of CBM wells after hydraulic fracturing

Chao Min, Guoquan Wen, Liangjie Gou, Xiaogang Li, Zhaozhong Yang

Category: Machine Learning

2022-12-21

Machine learning approaches are widely studied for predicting the production of coalbed methane (CBM) wells after hydraulic fracturing, but they are rarely used in practice because of their low generalization ability and lack of interpretability. This article proposes a methodology for discovering latent causality from observed data, with the aim of interpreting machine learning results indirectly. Based on the theory of causal discovery, a causal graph is derived with explicit input, output, treatment, and confounding variables. SHAP is then employed to analyze the influence of each factor on production capability, which indirectly interprets the machine learning models. The proposed method can capture the underlying nonlinear relationships between the factors and the output, remedying a limitation of traditional machine learning routines that rely on correlation analysis of factors. Experiments on CBM data show that the relationships the method detects between production and the geological and engineering factors are consistent with the actual physical mechanism. Compared with traditional methods, the interpretable machine learning models also forecast production capability better, with an average accuracy improvement of 20%.
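
As a rough illustration of the SHAP step in this abstract, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. The model `toy_production`, its feature names and coefficients, and the all-zero baseline are hypothetical and not taken from the paper; the brute-force enumeration stands in for the SHAP library, which uses faster estimators.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features outside the current coalition keep their baseline value.
    The cost grows factorially with the number of features, so this
    is only practical for a handful of inputs.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]          # add feature i to the coalition
            value = model(current)
            phi[i] += value - prev     # marginal contribution of feature i
            prev = value
    return [p / len(orderings) for p in phi]

# Hypothetical model: inputs are (permeability, fluid_volume, depth).
def toy_production(f):
    perm, fluid, depth = f
    return 2.0 * perm + 0.5 * fluid + perm * fluid - 0.1 * depth

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_production, x, baseline)
```

For this toy model the attributions come out as 3.0, 2.0, and -0.3 up to floating-point rounding: the interaction term is split evenly between the two features involved, and the values sum to the difference between the prediction at `x` and at the baseline.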

Memory-like Adaptive Modeling Multi-Agent Learning System

Xingyu Qian, Aximu Yuemaier, Longfei Liang, Wen-Chi Yang, Xiaogang Chen, Shunfen Li, Weibang Dai, Zhitang Song

Category: Computer Vision

2022-12-15

In this work, we propose a self-supervised multi-agent system, termed the memory-like adaptive modeling multi-agent learning system (MAMMALS), that realizes online learning for behavioral pattern clustering tasks on time series. Visual behaviors are encoded as discrete time series (DTS), which are then trained and modeled in the multi-agent system in a bio-memory-like form. We implement a fully decentralized multi-agent system design framework and verify its feasibility in a surveillance video application, vehicle path clustering. In multi-agent learning, learning methods designed for individual agents typically perform poorly globally because they ignore the synergy between agents.

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li

Category: Computer Vision

2022-11-10

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieved the new record 65.4 mAP on COCO test-dev. The code will be released at https://github.com/OpenGVLab/InternImage.

Magnetic Resonance Fingerprinting with compressed sensing and distance metric learning

Zhe Wang, Hongsheng Li, Qinwei Zhang, Jing Yuan, Xiaogang Wang

Category: Computer Vision | Machine Learning

2022-09-19

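Magnetic Resonance Fingerprinting (MRF) is a novel technique that simultaneously estimates multiple tissue-related parameters, such as the longitudinal relaxation time T1, the transverse relaxation time T2, the off-resonance frequency B0, and the proton density, from a scan of the object lasting only twenty seconds. However, the MRF method suffers from aliasing artifacts because it significantly undersamples the k-space data. In this work, we propose a compressed sensing (CS) framework for simultaneously estimating multiple tissue-related parameters based on the MRF method. It is more robust to low sampling ratios and is therefore more efficient at estimating MR parameters for all voxels of an object. In addition, the MRF method requires identifying, with the L2 distance, the atoms of the MR-signal-evolution dictionary that are nearest to the query fingerprints. We observe, however, that the L2 distance is not always a suitable metric for measuring the similarity between MR fingerprints. Adaptively learning a distance metric from the undersampled training data can significantly improve the matching accuracy of the query fingerprints. Numerical results on extensive simulated cases show that our method substantially outperforms state-of-the-art methods in terms of parameter estimation accuracy.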
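
The dictionary-matching step in this abstract can be sketched as a nearest-atom search under a configurable distance. The example below is a minimal illustration, assuming a squared distance of the form (u - v)^T M (u - v); the identity matrix recovers the L2 distance. The two-sample fingerprints, the (T1, T2) labels, and the matrix `learned` are all made up for illustration, and the paper's procedure for learning the metric from undersampled training data is not reproduced here.

```python
def metric_distance_sq(u, v, M):
    """Squared distance (u - v)^T M (u - v) for a square matrix M."""
    d = [a - b for a, b in zip(u, v)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(d[i] * Md[i] for i in range(len(d)))

def match_fingerprint(query, dictionary, M):
    """Return the parameters of the dictionary atom closest to `query` under M."""
    best = min(dictionary, key=lambda atom: metric_distance_sq(query, atom["signal"], M))
    return best["params"]

# Hypothetical dictionary: two atoms with made-up (T1, T2) labels in ms.
dictionary = [
    {"params": (800, 60), "signal": [1.0, 0.0]},
    {"params": (1200, 90), "signal": [0.0, 1.0]},
]
query = [0.4, 0.45]

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain L2 distance
learned = [[0.2, 0.0], [0.0, 1.0]]    # placeholder metric that down-weights sample 0

l2_match = match_fingerprint(query, dictionary, identity)
learned_match = match_fingerprint(query, dictionary, learned)
```

With these numbers the L2 distance selects the second atom while the placeholder metric selects the first, which shows how the choice of metric alone can change the matched parameters. Which match is better depends on the data, and that is what a learned metric is meant to capture.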

Learning Degradation Representations for Image Deblurring

Dasong Li, Yi Zhang, Ka Chun Cheung, Xiaogang Wang, Hongwei Qin, Hongsheng Li

Category: Computer Vision

2022-08-10

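In various learning-based image restoration tasks, such as image denoising and image super-resolution, degradation representations are widely used to model the degradation process and handle complicated degradation patterns. However, they are less explored in learning-based image deblurring, because blur kernel estimation does not perform well in challenging real-world cases. We argue that modeling degradation representations is particularly necessary for image deblurring, since blur patterns typically show much larger variations than noise patterns or high-frequency textures. In this paper, we propose a framework to learn spatially adaptive degradation representations of blurry images. A novel joint image reblurring and deblurring learning process is presented to improve the expressiveness of the degradation representations. To make the learned degradation representations effective in both reblurring and deblurring, we propose a Multi-Scale Degradation Injection Network (MSDI-Net) to integrate them into the neural networks. With this integration, MSDI-Net can adaptively handle various complicated blur patterns. Experiments on the GoPro and RealBlur datasets show that our proposed deblurring framework with the learned degradation representations outperforms state-of-the-art methods with appealing improvements. The code is released at https://github.com/dasongli1/learning_degradation.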

Frozen CLIP Models are Efficient Video Learners

Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li

Category: Computer Vision

2022-08-06

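Video recognition has been dominated by the end-to-end learning paradigm: a video recognition model is first initialized with the weights of a pretrained image model and then trained end-to-end on videos. This lets the video network benefit from the pretrained image model. However, finetuning on videos requires substantial computation and memory resources, and the alternative of directly using pretrained image features without finetuning the image backbone leads to subpar results. Fortunately, recent advances in Contrastive Vision-Language Pre-training (CLIP) pave the way for a new route for visual recognition tasks. Pretrained on large open-vocabulary image-text pair data, these models learn powerful visual representations with rich semantics. In this paper, we present Efficient Video Learning (EVL), an efficient framework for directly training high-quality video recognition models with frozen CLIP features. Specifically, we employ a lightweight Transformer decoder and learn a query token to dynamically collect frame-level spatial features from the CLIP image encoder. Furthermore, we adopt a local temporal module in each decoder layer to discover temporal clues from adjacent frames and their attention maps. We show that, despite being efficient to train with a frozen backbone, our models learn high-quality video representations on a variety of video recognition datasets. Code is available at https://github.com/opengvlab/feld-video-rencognition.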
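
The idea of a learned query token that collects frame-level features can be sketched as single-head dot-product attention of one query over a set of frozen feature vectors. This is a minimal illustration, not the EVL decoder: it omits the learned projections, multiple layers, and the local temporal module, and the feature values and query vectors are hypothetical.

```python
import math

def attention_pool(query, features):
    """Pool `features` with softmax weights given by scaled dot products with `query`."""
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale for feat in features]
    top = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(len(query))
    ]
    return pooled, weights

# Hypothetical frozen features for two frames; in training only the query would be updated.
frame_features = [[1.0, 0.0], [0.0, 1.0]]

uniform_pooled, uniform_weights = attention_pool([0.0, 0.0], frame_features)
focused_pooled, focused_weights = attention_pool([10.0, 0.0], frame_features)
```

A zero query weights both frames equally and returns their mean, while a query aligned with the first frame's feature concentrates almost all of the weight on that frame. Because the features stay fixed, gradients would only need to flow through the small pooling module.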

No Attention is Needed: Grouped Spatial-temporal Shift for Simple and Efficient Video Restorers

Dasong Li, Xiaoyu Shi, Yi Zhang, Xiaogang Wang, Hongwei Qin, Hongsheng Li

Category: Computer Vision

2022-06-22

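Video restoration, which aims to restore clear frames from degraded videos, has been attracting increasing attention. Video restoration needs to establish temporal correspondences across multiple misaligned frames. To this end, existing deep methods generally adopt complicated network architectures, such as integrating optical flow, deformable convolution, or cross-frame or cross-pixel self-attention layers, which results in expensive computation. We argue that, with proper design, temporal information in video restoration can be exploited much more efficiently and effectively. In this study, we propose a simple, fast, yet effective framework for video restoration. The key to our framework is the grouped spatial-temporal shift, which is simple and lightweight but can implicitly establish inter-frame correspondences and achieve multi-frame aggregation. Coupled with basic 2D U-Nets for frame-wise encoding and decoding, this efficient spatial-temporal shift module can effectively tackle the challenges in video restoration. Extensive experiments show that our framework surpasses the previous state-of-the-art method with 43% of its computational cost on both video deblurring and video denoising.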
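
To make the shift idea concrete, the sketch below applies a grouped shift along the time axis only: channels are split into groups, and each group is displaced by its own number of frames, with zeros filling positions that fall outside the clip. It is a simplified illustration, assuming per-frame feature vectors instead of feature maps and leaving out the spatial part of the shift; the function name, the zero padding, and the shift amounts are assumptions, not the paper's configuration.

```python
def grouped_temporal_shift(frames, shifts):
    """Shift channel groups of per-frame features along the time axis.

    frames: list over time of feature vectors (one list of floats per frame).
    shifts: one integer per channel group; a group with shift s at time t
            takes its values from frame t - s, or zeros outside the clip.
    """
    num_frames, num_channels = len(frames), len(frames[0])
    num_groups = len(shifts)
    assert num_channels % num_groups == 0, "channels must split evenly into groups"
    size = num_channels // num_groups
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for g, s in enumerate(shifts):
        lo, hi = g * size, (g + 1) * size
        for t in range(num_frames):
            src = t - s
            if 0 <= src < num_frames:
                out[t][lo:hi] = frames[src][lo:hi]
    return out

# Three frames with three channels each; every value encodes its frame index.
frames = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
shifted = grouped_temporal_shift(frames, shifts=[-1, 0, 1])
```

After the shift, the middle frame holds `[3.0, 2.0, 1.0]`: one channel from the next frame, one from itself, and one from the previous frame. A purely per-frame 2D network applied afterwards therefore sees information from neighboring frames without any explicit alignment or attention.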

3D Object Detection for Autonomous Driving: A Review and New Outlooks

Jiageng Mao, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

Category: Computer Vision | Artificial Intelligence | Robotics

2022-06-19

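In recent years, autonomous driving has been receiving increasing attention for its potential to relieve drivers' burdens and improve driving safety. In modern autonomous driving pipelines, the perception system is an indispensable component that aims to accurately estimate the state of the surrounding environment and provide reliable observations for prediction and planning. 3D object detection, which intelligently predicts the locations, sizes, and categories of critical 3D objects near an autonomous vehicle, is an important part of a perception system. This paper reviews the advances in 3D object detection for autonomous driving. First, we introduce the background of 3D object detection and discuss the challenges of this task. Second, we conduct a comprehensive survey of the progress in 3D object detection from the aspects of models and sensory inputs, covering LiDAR-based, camera-based, and multi-modal detection approaches. We also provide an in-depth analysis of the potential and challenges of each category of methods. In addition, we systematically investigate the applications of 3D object detection in driving systems. Finally, we analyze the performance of 3D object detection approaches, summarize the research trends over the years, and outline future directions for the field.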

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai

Category: Computer Vision

2022-06-09

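To build artificial neural networks that resemble biological intelligence systems, recent works have unified numerous tasks into a generalist model, which can process various tasks with shared parameters and without any task-specific modules. Although generalist models achieve promising results on various benchmarks, their performance degrades on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main cause of this phenomenon. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) to generalist models. Routing strategies under different levels of conditions are proposed to take both training/inference cost and generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate the interference across tasks and modalities, and it achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of the downstream data. Moreover, the introduction of Conditional MoEs preserves the generalization ability of generalist models to perform zero-shot inference on new tasks, such as video-text retrieval and video captioning. Code and pre-trained generalist models will be released.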
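
A conditional routing step of the kind this abstract describes can be sketched as top-k expert selection in which the gate looks at a condition vector instead of the token itself. The example below is only an illustration under stated assumptions: the condition is a one-hot modality indicator, the gate is a single linear layer with hand-picked weights, and the experts are trivial functions. None of these choices come from the paper, which studies several condition levels.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(condition, gate_weights, k=1):
    """Select the top-k experts for `condition`; return (index, weight) pairs.

    The weights of the selected experts are renormalized to sum to 1.
    """
    logits = [sum(w * c for w, c in zip(row, condition)) for row in gate_weights]
    probs = softmax(logits)
    top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top_k)
    return [(i, probs[i] / total) for i in top_k]

def moe_forward(x, condition, experts, gate_weights, k=1):
    """Weighted sum of the outputs of the experts chosen for `condition`."""
    out = [0.0] * len(x)
    for index, weight in route(condition, gate_weights, k):
        expert_out = experts[index](x)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out

# Hypothetical setup: two experts, condition = one-hot (image, text).
experts = [lambda x: [2.0 * v for v in x], lambda x: [-v for v in x]]
gate_weights = [[5.0, 0.0], [0.0, 5.0]]   # one row of gate weights per expert

image_out = moe_forward([1.0, 2.0], [1.0, 0.0], experts, gate_weights, k=1)
text_out = moe_forward([1.0, 2.0], [0.0, 1.0], experts, gate_weights, k=1)
```

Because the gate depends only on the condition, all tokens that share a modality are sent to the same expert, so the routing decision can be computed once per condition instead of once per token. This is one way to separate parameters across modalities while keeping the routing cost low.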

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai

Category: Computer Vision

2021-12-02

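The biological intelligence systems of animals perceive the world by integrating information from different modalities and processing it simultaneously for various tasks. In contrast, current machine learning research follows a task-specific paradigm, which leads to poor collaboration between tasks and high marginal costs for developing perception models for new tasks. In this paper, we present a generic perception architecture named Uni-Perceiver, which processes a variety of modalities and tasks with unified modeling and shared parameters. Specifically, Uni-Perceiver encodes different task inputs and targets from arbitrary modalities into a unified representation space with a modality-agnostic Transformer encoder and lightweight modality-specific tokenizers. Different perception tasks are modeled with the same formulation, namely finding the maximum-likelihood target for each input through the similarity of their representations. The model is pre-trained on several uni-modal and multi-modal tasks and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage. Results show that our pre-trained model, without any tuning, can achieve reasonable performance even on novel tasks. Prompt tuning on 1% of the downstream task data improves performance to a level close to state-of-the-art methods. Full-data fine-tuning further delivers results on par with state-of-the-art results. Code will be released.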
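
The shared formulation in this abstract, picking the target whose representation is most similar to the input's, can be sketched in a few lines. The example assumes cosine similarity and tiny hand-written vectors in place of encoder outputs; the labels, the vectors, and the choice of cosine similarity are illustrative assumptions, not details from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict(input_repr, target_reprs):
    """Return the candidate target most similar to the input representation."""
    return max(target_reprs, key=lambda name: cosine(input_repr, target_reprs[name]))

# Hypothetical representations in a shared space (stand-ins for encoder outputs).
target_reprs = {"cat": [1.0, 0.1], "dog": [0.1, 1.0]}
image_repr = [0.9, 0.2]

label = predict(image_repr, target_reprs)
```

Here `label` is `"cat"`. Since inputs and targets only need to live in the same space, the same function could serve classification (targets are class names), retrieval (targets are captions or images), and other tasks, which is the appeal of a single formulation.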